
fix: 5 critical bugs — TEE randomization, HTTP 402 hint, int dtype, inference None crash, gas estimation #201

Open

amathxbt wants to merge 4 commits into OpenGradient:main from amathxbt:fix/critical-5-bugs-tee-402-dtype-inference-gas

Conversation

@amathxbt
Contributor

Summary

This PR fixes 5 critical bugs identified from open issues and code audit, spanning TEE selection, LLM error handling, type conversions, inference safety, and gas estimation.


Bug 1 — TEE always picks the same node (closes #200)

File: src/opengradient/client/tee_registry.py

Root cause: get_llm_tee() always returned tees[0], routing 100% of traffic to a single TEE with zero load distribution or failover.

Fix: Replace tees[0] with random.choice(tees) so each LLM() construction independently selects from the full pool of active, registry-verified TEEs.

# Before
return tees[0]

# After
selected = random.choice(tees)
logger.debug("Selected TEE %s from %d active LLM proxy TEE(s)", selected.tee_id, len(tees))
return selected
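The full function body is not part of the diff; as a runnable sketch of the selection logic, assuming a `TEEInfo` record and a `get_llm_tee` signature that are hypothetical stand-ins for the registry's actual types:

```python
import logging
import random
from dataclasses import dataclass

logger = logging.getLogger(__name__)


@dataclass(frozen=True)
class TEEInfo:
    """Hypothetical stand-in for the registry's TEE record; field name is an assumption."""
    tee_id: str


def get_llm_tee(tees: list[TEEInfo]) -> TEEInfo:
    """Pick one TEE uniformly at random instead of always returning tees[0]."""
    if not tees:
        raise RuntimeError("No active LLM proxy TEEs available.")
    selected = random.choice(tees)
    logger.debug("Selected TEE %s from %d active LLM proxy TEE(s)", selected.tee_id, len(tees))
    return selected
```

Because each `LLM()` construction calls this independently, repeated client creations spread across the active pool with no coordination needed.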

Bug 2 — HTTP 402 swallowed as cryptic RuntimeError (closes #188)

File: src/opengradient/client/llm.py

Root cause: All HTTP errors (including 402 Payment Required) were caught by except Exception and re-raised as a generic RuntimeError("TEE LLM chat failed: ..."). Users had no idea they needed to call ensure_opg_approval() first.

Fix: Add import httpx and intercept httpx.HTTPStatusError before the generic handler. When status_code == 402, raise a RuntimeError with an explicit, actionable message pointing to ensure_opg_approval(). Applied to _chat_request(), completion(), and _parse_sse_response() (streaming path).

# New constant
_402_HINT = (
    "Payment required (HTTP 402): your wallet may have insufficient OPG token allowance. "
    "Call llm.ensure_opg_approval(opg_amount=<amount>) to approve Permit2 spending "
    "before making requests. Minimum amount is 0.05 OPG."
)

# In _chat_request / completion / _parse_sse_response:
except httpx.HTTPStatusError as e:
    if e.response.status_code == 402:
        raise RuntimeError(_402_HINT) from e
    raise RuntimeError(f"TEE LLM chat failed: {e}") from e
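The branching can be exercised without a live TEE or httpx installed. Below is a minimal sketch: `FakeResponse` and `FakeHTTPStatusError` are stand-ins mimicking the shape of httpx's exception, and `wrap_http_error` is a hypothetical helper isolating the 402 check:

```python
class FakeResponse:
    """Stand-in for httpx.Response; only the attribute the handler reads."""
    def __init__(self, status_code: int):
        self.status_code = status_code


class FakeHTTPStatusError(Exception):
    """Stand-in mimicking httpx.HTTPStatusError's .response attribute."""
    def __init__(self, response: FakeResponse):
        super().__init__(f"HTTP {response.status_code}")
        self.response = response


_402_HINT = (
    "Payment required (HTTP 402): your wallet may have insufficient OPG token allowance. "
    "Call llm.ensure_opg_approval(opg_amount=<amount>) to approve Permit2 spending "
    "before making requests. Minimum amount is 0.05 OPG."
)


def wrap_http_error(e: FakeHTTPStatusError) -> RuntimeError:
    """Translate a 402 into the actionable hint; all other statuses stay generic."""
    if e.response.status_code == 402:
        return RuntimeError(_402_HINT)
    return RuntimeError(f"TEE LLM chat failed: {e}")
```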

Bug 3 — Fixed-point int returns np.float32 instead of int (partially closes #103)

File: src/opengradient/client/_conversions.py

Root cause: convert_to_float32(value, decimals) always returned np.float32, even when decimals == 0 (i.e., the original value was an integer). Users expecting integer outputs received np.float32 and had to cast manually.

Fix: Add convert_fixed_point_to_python(value, decimals) that returns int when decimals == 0 and np.float32 otherwise. np.array() then infers the correct dtype (int64 vs float32) automatically. The old convert_to_float32 is kept as a deprecated alias for backward compatibility.

def convert_fixed_point_to_python(value: int, decimals: int) -> Union[int, np.float32]:
    if decimals == 0:
        return int(value)      # ← integer tensor stays integer
    return np.float32(Decimal(value) / (10 ** Decimal(decimals)))
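A quick check of the resulting dtypes — a sketch in which only `convert_fixed_point_to_python` itself comes from the diff; the sample values are illustrative:

```python
from decimal import Decimal
from typing import Union

import numpy as np


def convert_fixed_point_to_python(value: int, decimals: int) -> Union[int, np.float32]:
    # decimals == 0 means the on-chain value was an integer; keep it as int.
    if decimals == 0:
        return int(value)
    return np.float32(Decimal(value) / (10 ** Decimal(decimals)))


# np.array() infers the dtype from the Python element types:
ints = np.array([convert_fixed_point_to_python(v, 0) for v in (1, 2, 3)])
floats = np.array([convert_fixed_point_to_python(v, 2) for v in (150, 275)])
```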

Bug 4 — infer() crashes with AttributeError when inference node returns None

File: src/opengradient/client/alpha.py

Root cause: _get_inference_result_from_node() returns None when the node has no result yet. This None was passed directly to convert_to_model_output(), which calls event_data.get("output", {}), raising AttributeError: 'NoneType' object has no attribute 'get'. Additionally, if ModelInferenceEvent logs were empty, parsed_logs[0] raised an IndexError.

Fix:

# Guard 1: empty precompile logs
if not precompile_logs:
    raise RuntimeError("ModelInferenceEvent not found in transaction logs.")

# Guard 2: None from inference node
if inference_result is None:
    raise RuntimeError(
        f"Inference node returned no result for inference ID {inference_id!r}. "
        "The result may not be available yet — retry after a short delay."
    )
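The None guard can be isolated as a small helper — `require_inference_result` is a hypothetical name for illustration; the error message mirrors the diff:

```python
from typing import Any, Dict, Optional


def require_inference_result(
    inference_result: Optional[Dict[str, Any]], inference_id: str
) -> Dict[str, Any]:
    """Fail loudly with a retriable error instead of letting None reach .get() downstream."""
    if inference_result is None:
        raise RuntimeError(
            f"Inference node returned no result for inference ID {inference_id!r}. "
            "The result may not be available yet — retry after a short delay."
        )
    return inference_result
```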

Bug 5 — run_workflow() uses hardcoded 30M gas

File: src/opengradient/client/alpha.py

Root cause: run_workflow() hardcoded gas=30000000, unlike infer() which correctly uses estimate_gas(). This is both wasteful (users overpay) and fragile on networks with lower block gas limits.

Fix: Call estimate_gas() first and multiply by 3 for headroom. Fall back to 30M only if estimation itself fails.

try:
    estimated_gas = run_function.estimate_gas({"from": self._wallet_account.address})
    gas_limit = int(estimated_gas * 3)
except Exception:
    gas_limit = 30000000  # fallback
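The estimate-then-fallback logic in isolation — `compute_gas_limit` is a hypothetical helper for this sketch; the real code calls `run_function.estimate_gas()` with the wallet address:

```python
from typing import Callable

FALLBACK_GAS = 30_000_000  # the previous hardcoded limit, now only a safety net


def compute_gas_limit(estimate_gas: Callable[[], int], headroom: int = 3) -> int:
    """Estimate first (consistent with infer()), multiply for headroom, fall back on failure."""
    try:
        return int(estimate_gas() * headroom)
    except Exception:
        return FALLBACK_GAS
```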

Also replaces the hardcoded timeout=60 in new_workflow() with the INFERENCE_TX_TIMEOUT constant.


Files Changed

| File | Bugs Fixed |
| --- | --- |
| src/opengradient/client/tee_registry.py | Bug 1 (TEE randomization) |
| src/opengradient/client/llm.py | Bug 2 (HTTP 402 hint) |
| src/opengradient/client/_conversions.py | Bug 3 (int dtype) |
| src/opengradient/client/alpha.py | Bugs 4 & 5 (None guard + gas) |

Related Issues

Closes #200
Closes #188
Partially closes #103

Previously get_llm_tee() always returned tees[0], the first TEE in the
registry list. This caused all clients to hit the same TEE, providing no
load distribution and no resilience when that TEE starts failing.

Fix: use random.choice(tees) so each LLM() construction independently
picks from all currently active TEEs. Successive retries or re-initializations
will therefore naturally spread across the healthy pool.

Closes OpenGradient#200
Previously any HTTP error from the TEE (including 402 Payment Required)
was silently wrapped into a generic RuntimeError("TEE LLM chat failed: ...").
This caused the confusing traceback seen in issue OpenGradient#188 — the real cause
(insufficient OPG allowance) was buried inside the exception message.

Fix:
- Add `import httpx` and intercept httpx.HTTPStatusError before the
  generic `except Exception` handler in both _chat_request() and
  completion().
- When status == 402, raise a RuntimeError with a clear, actionable hint
  telling the user to call llm.ensure_opg_approval(opg_amount=<amount>).
- Also handle 402 in _parse_sse_response() for the streaming path.
- All other HTTP errors continue to surface as before.

Closes OpenGradient#188
Previously convert_to_float32() always returned np.float32 regardless of
the decimals field, forcing callers to manually cast integer results
(issue OpenGradient#103 — add proper type conversions from Solidity contract to Python).

Fix:
- Add convert_fixed_point_to_python(value, decimals) that returns int when
  decimals == 0 and np.float32 otherwise.  np.array() automatically picks
  the correct dtype (int64 vs float32) based on the element types.
- Update both convert_to_model_output() and convert_array_to_model_output()
  to call the new function.
- Keep convert_to_float32() as a deprecated backward-compatible alias so
  any external code that imports it directly continues to work.

Partially closes OpenGradient#103
… in run_workflow

Bug 4 — None crash in infer():
  _get_inference_result_from_node() can legitimately return None when the
  inference node has no result yet.  Previously this None was passed
  directly to convert_to_model_output(), which calls event_data.get(),
  causing an AttributeError: 'NoneType' object has no attribute 'get'.

  Fix: after calling _get_inference_result_from_node(), check for None
  and raise a clear RuntimeError with a human-readable message and the
  inference_id so callers know what to retry.

  Also guard the precompile log fetch: if parsed_logs is empty, raise
  RuntimeError instead of crashing with an IndexError on parsed_logs[0].

Bug 5 — hardcoded 30M gas in run_workflow():
  run_workflow() built the transaction with gas=30000000 (30 million)
  unconditionally.  This is wasteful (users overpay for gas) and can
  fail on networks with a lower block gas limit.

  Fix: call estimate_gas() first (consistent with infer()), then multiply
  by 3 for headroom.  Fall back to 30M only if estimation itself fails.

Also fixes new_workflow() deploy to use INFERENCE_TX_TIMEOUT constant
instead of the previous hardcoded 60 seconds.
@amathxbt
Contributor Author

Hey @adambalogh 👋 — this PR addresses 5 critical bugs found during a code audit, 3 of which directly close or partially close open issues you and the team have filed:

| # | Bug | Issue |
| --- | --- | --- |
| 1 | get_llm_tee() always returned tees[0] — no randomization or load distribution | Closes #200 |
| 2 | HTTP 402 from TEE was swallowed as a cryptic RuntimeError with no guidance | Closes #188 |
| 3 | Fixed-point values with decimals=0 returned np.float32 instead of int | Partially closes #103 |
| 4 | infer() crashed with AttributeError when the inference node returned None | Code audit |
| 5 | run_workflow() hardcoded gas=30000000 instead of using estimate_gas() | Code audit |

All changes are backward-compatible. Happy to adjust anything before review! 🙏
